41 research outputs found

    On certain limitations of recursive representation model

    There is a strong research effort towards developing models that can achieve state-of-the-art results without sacrificing interpretability and simplicity. One such model is the recently proposed Recursive Random Support Vector Machine (R2SVM), which is composed of stacked linear models. R2SVM was reported to learn deep representations outperforming many strong classifiers, such as Deep Convolutional Neural Networks. In this paper we analyze it from both a theoretical and an empirical perspective and show its important limitations. An analysis of the similar Deep Representation Extreme Learning Machine (DrELM) model is also included. It is concluded that these models, in their current form, achieve lower accuracy scores than a Support Vector Machine with a Radial Basis Function kernel.
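
    Below is a minimal sketch of the kind of recursively stacked linear model the abstract refers to, assuming a simplified R2SVM-style recursion in which each layer adds a random projection of the preceding layer's class scores to the original input and applies tanh (the published R2SVM aggregates the outputs of all previous layers). Function names and hyperparameters (beta, n_layers) are illustrative, not taken from the paper.

    import numpy as np
    from sklearn.svm import LinearSVC

    def _class_scores(clf, X):
        """Per-class decision scores; expand the binary case to two columns."""
        s = clf.decision_function(X)
        return np.column_stack([-s, s]) if s.ndim == 1 else s

    def fit_r2svm_like(X, y, n_layers=3, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_classes = len(np.unique(y))
        layers, projections, X_l = [], [], X
        for _ in range(n_layers):
            clf = LinearSVC().fit(X_l, y)
            W = rng.standard_normal((n_classes, X.shape[1]))   # random projection
            layers.append(clf)
            projections.append(W)
            # Recursive step: push this layer's predictions back into input space.
            X_l = np.tanh(X + beta * _class_scores(clf, X_l) @ W)
        return layers, projections

    def predict_r2svm_like(layers, projections, X, beta=0.1):
        X_l = X
        for clf, W in zip(layers[:-1], projections[:-1]):
            X_l = np.tanh(X + beta * _class_scores(clf, X_l) @ W)
        return layers[-1].predict(X_l)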

    Analysis of compounds activity concept learned by SVM using robust Jaccard based low-dimensional embedding

    Support Vector Machines (SVM) with an RBF kernel are among the most successful models in machine-learning-based prediction of compounds' biological activity. Unfortunately, existing datasets are highly skewed and hard to analyze. In our research we try to answer the question of how deep the activity concept modeled by the SVM really is. We perform the analysis using a model which embeds compounds' representations in a low-dimensional real space using near-neighbour search with Jaccard similarity. As a result, we show that the concept learned by the SVM is not much more complex than a slightly richer nearest-neighbour search. As an additional result, we propose a classification technique based on locality-sensitive hashing, approximating the Jaccard similarity through minhashing, which performs well on 80 tested datasets (consisting of 10 proteins with 8 different representations) while at the same time allowing fast classification and efficient online training.
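
    A minimal sketch of the minhashing idea mentioned above, assuming compounds are represented as binary fingerprints (sets of "on" bit indices): MinHash signatures approximate Jaccard similarity, so similarity search reduces to comparing short integer signatures instead of full fingerprints. This illustrates the general technique, not the exact pipeline used in the paper.

    import numpy as np

    def minhash_signature(on_bits, n_hashes=128, seed=0):
        """Return a MinHash signature for a set of fingerprint bit indices."""
        rng = np.random.default_rng(seed)
        # Random affine hash functions h(x) = (a*x + b) mod p over a large prime.
        p = 2_147_483_647
        a = rng.integers(1, p, size=n_hashes)
        b = rng.integers(0, p, size=n_hashes)
        bits = np.asarray(sorted(on_bits), dtype=np.int64)
        hashed = (a[:, None] * bits[None, :] + b[:, None]) % p   # (n_hashes, |set|)
        return hashed.min(axis=1)

    def estimated_jaccard(sig_a, sig_b):
        """Fraction of matching minhash values estimates the Jaccard similarity."""
        return float(np.mean(sig_a == sig_b))

    # Example: two overlapping fingerprints (same seed -> same hash functions).
    fp1, fp2 = {1, 5, 9, 42, 77}, {1, 5, 9, 42, 100}
    sig1, sig2 = minhash_signature(fp1), minhash_signature(fp2)
    print(estimated_jaccard(sig1, sig2))   # close to the true Jaccard 4/6 ~ 0.67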

    Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

    We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing the signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequences of this universal behavior for the initial and late phases of the learning process. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved, based on our results, by ensuring the same level of dynamical isometry at initialization.
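
    A minimal numerical sketch of the quantity being studied, assuming a toy fully-connected residual block x_{l+1} = x_l + W_l tanh(x_l) / sqrt(L) at random Gaussian initialization: it computes the singular values of the input-output Jacobian to check how closely they concentrate around 1 (dynamical isometry). The architecture and 1/sqrt(L) scaling are illustrative choices, not the exact setting analysed in the paper.

    import numpy as np

    def resnet_jacobian_singular_values(width=200, depth=100, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(width)
        J = np.eye(width)                       # running input-output Jacobian
        for _ in range(depth):
            W = rng.standard_normal((width, width)) / np.sqrt(width)
            pre = np.tanh(x)
            dphi = 1.0 - pre**2                 # derivative of tanh at x
            # Jacobian of one block: I + (1/sqrt(depth)) * W @ diag(dphi).
            J_layer = np.eye(width) + (W * dphi[None, :]) / np.sqrt(depth)
            J = J_layer @ J
            x = x + (W @ pre) / np.sqrt(depth)
        return np.linalg.svd(J, compute_uv=False)

    svals = resnet_jacobian_singular_values()
    print(svals.min(), svals.max())   # well-conditioned Jacobians keep both O(1)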

    Active learning of compounds activity : towards scientifically sound simulation of drug candidates identification

    Virtual screening is one of the vital elements of the modern drug design process. It is aimed at the identification of potential drug candidates in large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines belonging to the group of the most popular ones. Most commonly, performance in this task is evaluated in an offline manner, where a model is tested after training on a randomly chosen subset of the data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses the next batches of compounds to test. This paper proposes to frame this problem as an active learning process, in which we search for new drug candidates through exploration of the compound space simultaneously with the exploitation of current knowledge. We introduce a proof of concept for the simulation and evaluation of such a pipeline, together with novel solutions based on mixing clustering with a greedy k-batch active learning strategy.
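
    A minimal sketch of one batch-selection step of the kind described above, assuming that "mixing clustering with a greedy k-batch strategy" means: cluster the unlabelled pool to enforce diversity, then greedily take the most uncertain compound from each cluster. This is an illustrative interpretation, not the paper's exact algorithm.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def select_batch(model, X_pool, batch_size=10, seed=0):
        """Pick `batch_size` unlabelled compounds to send for testing."""
        clusters = KMeans(n_clusters=batch_size, random_state=seed, n_init=10).fit_predict(X_pool)
        margins = np.abs(model.decision_function(X_pool))   # small margin = uncertain
        picked = []
        for c in range(batch_size):
            idx = np.where(clusters == c)[0]
            picked.append(idx[np.argmin(margins[idx])])      # most uncertain in this cluster
        return np.array(picked)

    # Usage in an active-learning loop (labels come from the simulated "experiment"):
    # model = SVC(kernel="rbf").fit(X_labelled, y_labelled)
    # batch = select_batch(model, X_pool)
    # ... query labels for X_pool[batch], move them to the labelled set, retrain.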

    Generative models should at least be able to design molecules that dock well : a new benchmark

    Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a widely used computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that various graph-based generative models fail to propose molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we also include simpler tasks in the benchmark, based on a simpler scoring function. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone toward the goal of automatically generating promising drug candidates.
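
    A minimal sketch of the kind of evaluation loop such a benchmark implies: generate candidate molecules, dock each against a target, and report the best docking scores. The smina invocation below assumes its Vina-style command-line flags (-r/-l for receptor/ligand, --autobox_ligand to define the search box), and generate_molecules / to_ligand_file / parse_affinity are hypothetical placeholders; consult the linked repository for the actual benchmark API.

    import subprocess

    def smina_dock(receptor_pdbqt, ligand_file, ref_ligand_file, out_file="docked.sdf"):
        """Run smina on one ligand and return its raw text output."""
        cmd = [
            "smina",
            "-r", receptor_pdbqt,
            "-l", ligand_file,
            "--autobox_ligand", ref_ligand_file,   # box around a reference ligand
            "--exhaustiveness", "8",
            "-o", out_file,
        ]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # Benchmark-style loop (placeholders: generate_molecules is any generative model,
    # to_ligand_file converts SMILES to a 3D ligand file, parse_affinity reads the score):
    # scores = []
    # for smiles in generate_molecules(n=100):
    #     ligand = to_ligand_file(smiles)
    #     scores.append(parse_affinity(smina_dock("receptor.pdbqt", ligand, "ref_ligand.pdbqt")))
    # print(sorted(scores)[:10])   # lower (more negative) affinity = better binding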
